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Abstract 

Background: Automated, image based high-content screening is a fundamental tool for discovery in biological 
science. Modern robotic fluorescence microscopes are able to capture thousands of images from massively parallel 
experiments such as RNA interference (RNAi) or small-molecule screens. As such, efficient computational methods 
are required for automatic cellular phenotype identification capable of dealing with large image data sets. In this 
paper we investigated an efficient method for the extraction of quantitative features from images by combining 
second order statistics, or Haralick features, with curvelet transform. A random subspace based classifier ensemble 
with multiple layer perceptron (MLP) as the base classifier was then exploited for classification. Haralick features 
estimate image properties related to second-order statistics based on the grey level co-occurrence matrix (GLCM), 
which has been extensively used for various image processing applications. The curvelet transform has a more 
sparse representation of the image than wavelet, thus offering a description with higher time frequency resolution 
and high degree of directionality and anisotropy, which is particularly appropriate for many images rich with edges 
and curves. A combined feature description from Haralick feature and curvelet transform can further increase the 
accuracy of classification by taking their complementary information. We then investigate the applicability of the 
random subspace (RS) ensemble method for phenotype classification based on microscopy images. A base 
classifier is trained with a RS sampled subset of the original feature set and the ensemble assigns a class label by 
majority voting. 

Results: Experimental results on the phenotype recognition from three benchmarking image sets including HeLa, 
CHO and RNAi show the effectiveness of the proposed approach. The combined feature is better than any 
individual one in the classification accuracy. The ensemble model produces better classification performance 
compared to the component neural networks trained. For the three images sets HeLa, CHO and RNAi, the Random 
Subspace Ensembles offers the classification rates 91.20%, 98.86% and 91.03% respectively, which compares sharply 
with the published result 84%, 93% and 82% from a multi-purpose image classifier WND-CHARM which applied 
wavelet transforms and other feature extraction methods. We investigated the problem of estimation of ensemble 
parameters and found that satisfactory performance improvement could be brought by a relative medium 
dimensionality of feature subsets and small ensemble size. 

Conclusions: The characteristics of curvelet transform of being multiscale and multidirectional suit the description 
of microscopy images very well. It is empirically demonstrated that the curvelet-based feature is clearly preferred to 
wavelet-based feature for bioimage descriptions. The random subspace ensemble of MLPs is much better than a 
number of commonly applied multi-class classifiers in the investigated application of phenotype recognition. 
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Background 

Complex cellular structures such as molecular construc- 
tion of a cell can be studied by fluorescence microscopy 
images of cells with appropriate stains. Robotic systems 
nowadays can automatically acquire thousands of images 
from cell assays, which are often referred as being "high- 
content" for the large amount of information. These 
images reflect the biological properties of the cell with 
many features, including size, shape, amount of fluores- 
cent label, DNA content, cell cycle, and cell morphol- 
ogy. With interdisciplinary efforts from computer 
science and biology, scientists are now able to carry out 
large-scale screening of cellular phenotypes, at whole- 
cell or sub-cellular levels, which are important in many 
applications, e.g., delineating cellular pathways, drug tar- 
get validation and even cancer diagnosis [1,2]. 

From the high-throughput screening, biologists can 
also greatly benefit in further understanding the complex 
cellular processes and genetic functions [3,4]. For exam- 
ple, a gene's normal operations in the cell can be assessed 
by observing the downstream effect of perturbing gene 
expression [5]. By introduction of double-stranded RNA 
(dsRNA) into a diverse range of organisms and cell types, 
the complementary mRNA can be degraded, a phenom- 
enon known as RNA interference (RNAi) [6,7]. The dis- 
covery of RNAi and the availability of whole genome 
sequences allow the systematic knockdown of every gene 
or specific gene sets in a genome [8-10]. Image-based 
screening of the entire genome for specific cellular func- 
tions thus becomes feasible by the development of Dro- 
sophila RNAi technology to systematically disrupt gene 
expression [11,12]. Genome-wide screens, however, pro- 
duce huge volumes of image data which is beyond 
human's capability of manual analysis, and automating 
the analysis of the large number of images generated in 
such screening is the bottleneck in realizing the full 
potential of cellular and molecular imaging studies. 

To advance the development of high content screen- 
ing (HCS) for genome analysis, computer vision and 
pattern analysis techniques have to be resorted to char- 
acterize morphological phenotypes quantitatively and to 
identify genes and their dynamic relationships on a gen- 
ome-wide scale [4,13]. Such a bio-image informatics fra- 
mework would consist of several components: cellular 
segmentation, cellular morphology and image feature 
extraction, cellular phenotype classification, and cluster- 
ing analysis [11,14]. In this article, our effort is made 
toward further investigating the challenging multi-class 
phenotype classification problem from microscopy 
images by using benchmarking fluorescence microscopy 
images [15-17]. 

With appropriate cellular segmentation results, pheno- 
type recognition can be studied in a multi-class classifi- 
cation framework, which involves two interweaved 



components: feature representation and classification. 
Efficient and discriminative image representation is a 
fundamental issue in any bioimage recognition tasks. 
Most of the proposed approaches for image-based high- 
content screening employed feature set which consist of 
different combinations of morphological, edge, texture, 
geometric, moment and wavelet features [15,18-22]. In 
recent years, computer science has seen much pro- 
gresses in the development of various efficient image 
feature description methods, many of which have 
become "off-the-shelf standard techniques applicable to 
bioimages analysis. In this paper, we will show that the 
curvelet transform [23-27] outperform many other 
known feature descriptions for the cellular images. 
Based on the latest research achievements on multireso- 
lution analysis for image, Curvelet Transform can accu- 
rately capture edge information by taking the form of 
basis elements which exhibit very high directional sensi- 
tivity and are highly anisotropic. It has been shown that 
curvelet is well suited for representing images which are 
rich of edge information and the efficiency has been 
demonstrated in many tasks [28-31]. 

To our knowledge, there is no previous work discuss- 
ing the use of Curvelet transform in fluorescence micro- 
scope images except [32] which applied curvelet for 
denoising. The research presented in this paper is to 
investigate the application of curvelet on microscope 
images based phenotype recognition, and compare it 
with the application of wavelets. The simple statistics of 
mean and standard deviation from multiscale curvelet 
transform coefficients are extracted and evaluated as the 
basic curvelet features. On the other hand, due to the 
proven effectiveness of the traditional Haralick features 
[33] for extracting texture information, and in micro- 
scope images of biological cells in particular [18,19], we 
proposed to form a combined image description from 
the Curvelet Transform and Haralick features. Haralick 
feature is based on gray-level spatial dependencies using 
a Gray Level Co-occurrence Matrix (GLCM) that mea- 
sures the frequency that a particular gray level is found 
adjacent to another gray level. By complementarily com- 
bining the advantages of each feature description 
method, the classification performance can be consider- 
ably enhanced. 

Machine learning methods such as artificial neural 
networks and Support Vector Machine (SVM) have 
been utilized for the classification of subcellular protein 
location patterns with fluorescence microscope images 
[18-21,34]. Multi-class phenotype images, however, are 
often featured with large intra- class variations and inter- 
class similarities, which poses serious problems for 
simultaneous multi-class separation using the standard 
classifiers. And other rate-limiting factors challenging 
classifier design is that the dimension of the feature 



Zhang and Pham BMC Bioinformatics 2011, 12:128 
http://www.biomedcentral.eom/1 471 -21 05/1 2/1 28 



Page 3 of 1 3 



space is often larger than the number of training exam- 
ples per class. In recent years, many sophisticated learn- 
ing paradigms have been proposed to design accurate 
and robust classification systems. Among them, ensem- 
ble learning has attracted much attention due to the 
promising results for many applications. An ensemble of 
classifiers integrates multiple base classifiers such as 
multiple layer perceptrons (MLP) using a same learning 
algorithm [35,36]. Among the representatives of ensem- 
ble learning, Random Subspace method (RSM) [37] is 
an efficient way to create ensemble of classifiers. RSM 
divides the entire space of features into subspaces and 
each subspace is formed by randomly picking features 
from the entire space, allowing for features to be 
repeated across subspaces. In this paper, we propose to 
construct and evaluate a Random Subspace classifier 
ensemble with multiple layer perceptron as the base 
classifier, using the combined features from Curvelet 
Transform and Haralick features. 

Results 

Three image sets HeLa, CHO and RNAi from the IICBU 
benchmarking data [15,16] were evaluated. All of the 
images are processed as wholes without any detection or 
segmentation. In our experiments, a simple holdout 
methodology was applied. For each of the three image 
sets, we randomly split it into training and test sets, each 
time with 20% of each class's images reserved for testing 
while the rest for training. The classification results are 
the average from 100 runs, such that each run used a 
random split of the data to training and test sets. 

Our first set of experiments aims at selecting appro- 
priate features from the curvelet transform. The simple 
statistics of mean and standard deviation from each 
band of curvelet transform have been used as efficient 
signatures in several applications [29-31]. Rather than 
theoretical analysis of the distribution properties of Cur- 
velet Transform coefficients for the cell images, we con- 
ducted empirical analysis for the usefulness to include 
other statistics into the feature vectors, for example, 
energy, entropy, skewness and kurtosis. To determine 
the discriminating power from different combinations of 
these statistics as the image features, we compared the 
classification performance from the following candidate 
curvelet feature vectors. In addition to the basic signa- 
ture of mean and standard deviation (basic), other sets 
of statistics were calculated and added to the basic sig- 
nature in turn, including norm, energy, variance, skew- 
ness, kurtosis, and entropy. The results were reported in 
Figure 1. While the addition of variance or energy 
slightly increase the classification accuracy for the HeLa 
images, the inclusion of entropy clearly outperform 
other kind of combinations for both of the CHO and 
RNAi images. 



The discriminating strengths of the feature vectors 
from GLCM [33], Daubechies wavelet [38], Gabor wave- 
let [39,40] and Curvelet transform [23] were compared 
using three-layer perceptron (MLP) neural network, 
which has the number of inputs same as the number of 
features, one hidden layer with 20 units and linear units 
representing the class labels (10 for HeLa and RNAi, 5 
for CHO). The networks are trained using the Conju- 
gate Gradient learning algorithm for 500 epochs. Figure 
2 illustrates the comparison of accuracies for each of 
the three image sets from GLCM, Daubechies wavelet, 
Gabor wavelet, and curvelet feature descriptors. For all 
of the three image sets, the curvelet descriptors clearly 
outperforms all other descriptors. Specifically, for HeLa, 
CHO and RNAi images, the classification accuracies 
from Curvelet features are 86.1%, 96.4% and 87.3% 
respectively, which compares favorably over the results 
from GLCM features 83.3%, 95.2%, and 86%. Among 
the four different features compared, the curvelet fea- 
tures also consistently demonstrates the superiority over 
Gabor wavelet and Daubechies wavelet, which were in 
fact expected since the curvelet transform is able to cap- 
ture the multi-dimensional features in wedges as 
opposed to points in wavelet transforms. The multidir- 
ectional features in curvelets prove to be very effective 
in the classification of microscopy cell images which 
often demonstrate piece-wise smooth with rich edge 
information. The best classification accuracies for all of 
the three image sets were from the aggregated features 
by simply concatenating curvelet and GLCM, namely, 
HeLa 89.2%, CHO 98.2% and RNAi 89.9% respectively. 
The benefit of applying them in an integrative way is 
due to the fact that GLCM and curvelet transform fea- 
tures emphasize texture and edge or piece-wise smooth 
characteristics of image differently. As expected, the 
combination of both features from GLCM and curvelet 
transform produces a higher accuracy than the methods 
being used singly, as shown in Figure 2. 

We proceeded to evaluate several different but com- 
monly used supervised learning methods to the multi- 
class phenotype classification problem, including /c-near- 
est neighbors (/cNN), multi-layer perceptron neural net- 
works, SVM and random forest, using the above three 
image sets. The feature vector for each image is calcu- 
lated from the sub-bands of curvelet transform (Basic + 
Variance for HeLa, Basic + Entropy for CHO and 
RNAi). We simply chose k = 1 for /cNN in the compari- 
sons. MLP is the same as used in the above experiment. 
Designing SVM classifiers [41] includes selecting the 
proper kernel function and choosing the appropriate 
kernel parameters and C value. The popular library for 
SVM, LIBSVM http://www.csie.ntu.edu.tw/~cjlin/libsvm, 
was used in the experiment. We used the radial basis 
function kernel for the SVM classifier. The parameter y 
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Figure 1 Comparison of the performance for different curvelet-based feature vectors for phenotype classifications from HeLa, CHO 
and RNAi images. 



that defines the spread of the radial function was set to 
be 5.0 and parameter C that defines the trade-off 
between the classifier accuracy and the margin (the gen- 
eration) to be 3.0. A random forest (RF) classifier [42] 
consists of many decision trees and outputs the class 
that is the mode of the classes output by individual 
trees. In the comparison experiments, the number of 
trees for random forest classifier was chosen as 300 and 
the number of variables to be randomly selected from 
the available set of variables was selected as 20. 



The comparison results from applying the above four 
classifiers were provided in Figure 3, which confirmed 
that for each image dataset, the best result was obtained 
by using MLP. For RNAi, the result from MLP is 89.9%, 
which is better than the published result 82% [15]. The 
accuracies from other three classifiers are 71.6% (/cNN), 
70.1% (random forest), and 77.5% (SVM). For 2D-Hela 
and CHO, the correct classification rates from MLPs are 
89.2% and 98.4%, respectively, which are also very com- 
petitive. The results for these two datasets obtained by 




Figure 2 Comparison of the Daubechies wavelet-based, Gabor-wavelet based, GLCM-based and curvelet-based feature descriptors for 
phenotype classifications from HeLa, CHO and RNAi images. 
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Shamir et al. are 84% for 2D-Hela and 93% for CHO 
[15]. The results obtained by MLP contrast to the gen- 
erally accepted perception that SVM classifier is better 
than neural network in classification. The most reason- 
able explanation for the better performance of MLP 
from our experiment is that MLP as a memory-based 
classifier is more resistant to insufficient data amount 
comparing the margin or distance-based SVM. 

Our next experimental study aims to demonstrate the 
superiority of Random Subspace MLP classifier ensem- 
ble over the single MLP used in the previous experi- 
ments. The holdout experiment setting is similar. To 
ensure the diversity among the MLPs in an ensemble, 
the numbers of hidden units in the component networks 
are varied by randomly choosing them from a range of 
30 -50. Using an ensemble of size 20 and 80% of origi- 
nal dimensionality for feature subsets, the classification 
results obtained for HeLa, CHO and RNAi were sum- 
marized in Table 1, from which one can see that for all 
the three image sets, the random subspace MLP ensem- 
ble does bring the improvement on the classification 
accuracy. 

In the Random Subspace method, there are two 
important parameters that have to be considered. The 
first is the ensemble size L, which is the number of base 

Table 1 Performance Improvement from Random 



Subspace Ensemble (RSE) 



Classifier 


RNAi 


2D-Hela 


CHO 


MLP 


89.90% 


89.20% 


98.02% 


MLP-RSE (ensemble size = 20) 


90.10% 


90.65% 


98.40% 



classifiers in the ensemble. The second is the dimen- 
sionality of feature subsets M. Recently, the selection of 
these two parameters has been addressed in the classifi- 
cation of brain images of fMRI [36] and text categoriza- 
tion [43], which shows that relatively medium M and 
small L yield an ensemble that could improve the per- 
formance. Our next experiment assessed the effect of 
ensemble size for the phenotype image classifications. 
We first varied the sizes of the ensembles from 5 com- 
ponents MLPs to 50, with fixed feature subspace dimen- 
sionality 350, which account for 80% of combined 
features' dimension. The same experiment procedure 
described above was repeated. The results of the aver- 
aged classification accuracies in Table 2 shows that 
there are no real benefits of forming very large ensem- 
bles. In the case of RNAi, the maximum accuracy 
91.03% is reached with size 5, beyond which there is no 
improvement in performance. In the cases of HeLa and 
CHO, size 15 seems to be sufficient for the much 
increased classification accuracies comparing to the 
results in [15]. Beyond size 15, larger ensemble size may 
bring quite marginal improvement for the CHO images 
at the cost of heavy computational burden in the train- 
ing phase. 

To answer the question how the dimensionality of fea- 
ture subsets M influence the classification performance 
of RSM, we compared the RSM classification perfor- 
mances for the three image sets by varying subspace 
dimensionalities. For original 438 dimensional combined 
features, selection of 55% features means 241 dimen- 
sionality and so on. With 40 component MLPs in the 
ensemble, the comparison results are given in Table 3, 
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Table 2 Comparison of Classification Performance of Varying Ensemble Sizes 



Size 


5 


10 


15 


20 


25 


30 


35 


40 


45 


50 


RNAi 


91.03% 


90.37% 


90.05% 


90.10% 


89.62% 


90.05% 


89.38% 


90.34% 


89.27% 


89.50% 


HeLa 


89.99% 


90.52% 


90.94% 


90.65% 


90.59% 


90.45% 


90.90% 


91.05% 


90.38 


90.07% 


CHO 


98.34% 


98.34% 


98.52% 


98.40% 


98.34% 


98.74% 


98.60% 


98.64% 


98.46% 


98.46% 



which indicates that satisfactory results are obtained 
with medium Ms. For the RNAi data, maximum 90.34% 
classification rate is reached with selection of 80% of 
original features (M = 350). For the HeLa and CHO 
image sets, the largest accuracies 91.20% and 98.86% are 
from selecting 75% and 85% of original features (M = 
328 and M = 372), respectively. 

The confusion matrices that summarize the details of 
the above random subspace ensemble for the three 
image sets are given in the following Tables 4, 5 and 6. 
In the confusion matrix representation, the rows and 
columns indicate true and predicted class, respectively. 
The diagonal entries represent correct classification 
while the off-diagonal entries represent incorrect ones. 
Confusion matrix is often used in the study of multi- 
class classification problems and to measure the similar- 
ity between classes of phenotypes [44]. For the HeLa 
images, the RS ensemble can well distinguish two Golgi 
proteins, GPP130 and giantin, which have been shown 
to be very difficult to discriminate by visual inspection 
[45]. For the RNAi image sets, it is apparent that among 
the 10 classes, CG 10873 type is the easiest to be cor- 
rectly classified while the CG9484 is the most difficult 
one. On the other hand, the resulting confused genes do 
not directly share the related biological mechanisms. 
Each of these genes is associated with a different 
mechanism, but some pathways can be more similar to 
each others than others. For example, the gene CG8114 
(Pebble), which leads to binucleate phenotype, is some- 
times confused with gene CG3938 (CyclinE), which is 
associated with Gl arrest. These two genes are more 
related to each other than other pairs and the confusion 
is often expected. However, the results from Random 
Subspace ensemble indicate that these two genes can be 
well distinguished without any confusion. 

Discussion 

Accurately and robustly classification of cellular pheno- 
types is still a challenging task in image based high- 



content screening. To find the best description for the 
microscopy images, many features extraction methods 
have been attempted in previous studies, but all encoun- 
ter with various aspects of difficulty in dealing with var- 
ious irregularities in the cellular morphology. 

Our motivation for overcoming this challenge is two- 
fold. Firstly, we have attempted to demonstrate that the 
classification can be improved by using highly discrimi- 
native image features. This can be achieved by the utili- 
zation of the curvelet transform to extract such a 
feature. Being similar to various wavelet transforms, the 
curvelet transform can capture the structural informa- 
tion of the images in this study at multiple scales, loca- 
tions, and orientations. The major advantage is that the 
curvelet transform can identify the structural detail 
along the radial 'wedges' in the frequency domain, 
which is inherent in images of rich edge information. 
Secondly, the notion of combination of multiple com- 
plementary features can further improve the classifica- 
tion performance. The combination of features for 
pattern classification has recently been embraced by the 
research community of image processing and pattern 
recognition [46]. However, most proposed methods for 
feature combination are based on optimization techni- 
ques or machine learning algorithms, which are often 
very complicated for practical implementation. Our pro- 
posed method offers a much simpler solution by trans- 
forming multiple features into a single representation 
based on which the random subspace classifier ensemble 
can efficiently integrate different aspects of information 
from various random subspaces. By using this strategy, 
the high dimensionality problem arising from using the 
conventional combination of multiple features is there- 
fore implicitly avoided. 

In general, the method we have proposed in this paper 
can be applied to classifying and understanding complex 
patterns of many biological systems. The gaining of 
such knowledge can quickly provide life-science 
researchers with new insights into the changing 



Table 3 Comparison of Classification Performance from Random Subspace Ensemble by Varying the Subspace 
Dimensionalities 



Dimensionalities(%) 


55 


60 


65 


70 


75 


80 


85 


90 


95 


100 


RNAi 


89.34% 


90.10% 


90.20% 


90.60% 


89.90% 


90.34% 


90.20% 


89.50% 


89.34% 


89.74% 


HeLa 


90.49% 


90.23% 


90.52% 


90.57% 


91.15% 


90.85% 


91.00% 


90.51% 


90.66% 


90.70% 


CHO 


98.58% 


98.68% 


98.50% 


98.46% 


98.64% 


98.64% 


98.86% 


98.30% 


98.50% 


98.80% 
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Table 4 Averaged confusion matrix for RNAi (%) 
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CGI 0873 


100 


0 


0 


0 


0 


0 


0 


0 


0 


0 


CG1258 


0 


90.39 


0 


0 


0 


0 


9.61 


0 


0 


0 


CG3733 


0 


0 


98.76 


0 


0 


0 


0.99 


0.25 


0 


0 


CG7922 


0 


0 


0 


98.36 


0 


0 


0 


1.64 


0 


0 


CG8222 


0 


5.04 


0 


0 


89.12 


0 


1.59 


0.27 


0 


3.98 


CG 12284 


0 


2.42 


0 


0 


0.24 


86.96 


2.42 


1.21 


3.86 


2.90 


CG17161 


0 


9.28 


1.59 


0 


0.27 


0 


88.86 


0 


0 


0 


CG3938 


1.90 


0 


0 


6.50 


1.90 


0.27 


0 


89.43 


0 


0 


CG8114 


0 


0 


0 


0 


0 


1.11 


0 


0 


98.89 


0 


CG9484 


3.40 


0 


0.24 


3.88 


12.14 


8.74 


0 


0 


1.7 


69.90 



biological behaviors in various treatment conditions, and 
to facilitate rapid screening and testing of new therapeu- 
tic interventions for major public health problems. 

Conclusion 

This paper further investigated the challenging multi- 
class phenotype classification problem from microscopy 
images. Two contributions are presented. Firstly, we 
proposed to combine two different feaures from Gray 
Level Co-occurrence Matrix (GLCM) and curvelet trans- 
form to efficiently describe microscopy images. Sec- 
ondly, we examined random subspace classifier 
ensemble with multi-layer perceptron as the base classi- 
fier, which seem to be well-suited to the characteristics 
of microscopy images, as the high dimensionality of the 
data could be implicitly solved by randomly selecting 
subsets of features. Experiments on three benchmarking 
microscopy image datasets showed that the random sub- 
space MLP ensemble method achieved significantly 
higher classification accuracies (91% for RNAi, 91.2% for 
HeLa and 98.9% for CHO), compared to the published 
results (82% for RNAi, 84% for HeLa and 93% for 
CHO), which used wavelet as part of the features and a 
general-purpose classification scheme WND-CHARM. 
This supports the claim that Random Subspace ensem- 
ble can be used as a simple yet efficient classifier design 



methodolody and curvelet features effectively measure 
the informativeness in the microscopy images. 

Methods 

The Benchmarking Fluorescence Microscopy Image 
Datasets 

Three benchmark fluorescence microscopy image data- 
sets in [15,16] were used in our study, which are 2D- 
Hela, CHO and RNAi. The 2D HeLa dataset is a collec- 
tions of HeLa cell immunofluorescence images contain- 
ing 10 distinct subcellular location patterns. The 10 
organelles from the images are DNA (Nuclei), ER 
(Endoplasmic reticulum), Giantin, (cis/medial Golgi), 
GPP130 (cis Golgi), Lamp2 (Lysosomes), Mitochondria, 
Nucleolin (Nucleoli), Actin, TfR (Endosomes), Tubulin. 
CHO is a dataset of fluorescence microscope images of 
CHO (Chinese Hamster Ovary) cells. The images were 
taken using 5 different labels. The labels are: anti-gian- 
tin, Hoechst 33258 (DNA), anti-lamp2, anti-nop4, and 
anti-tubulin. 

The RNAi dataset is a set of fluorescence microscopy 
images of fly cells (D. melanogaster) subjected to a set 
of gene-knockdowns using RNAi. The cells are stained 
with DAPI to visualize their nuclei. Each class contains 
20 1024 x 1024 images of the phenotypes resulting from 
knockdown of a particular gene. Ten genes were 



Table 5 Averaged confusion matrix for HeLa (%) 





Actin 


TfR 


ER 


Giantin 


GPP130 


LAMP2 


Tubulin 


Mitoch. 


Nucleolin 


DNA 


Actin 


100 


0 


0 


0 


0 


0 


0 


0 


0 


0 


TfR 


0 


80.36 


1.25 


0 


0.23 


10.78 


3.75 


3.63 


0 


0 


ER 


0 


0 


93.81 


0 


0 


0 


0.80 


3.78 


0 


1.61 


Giantin 


0 


0 


0 


95.56 


4.21 


0 


0 


0.23 


0 


0 


GPP130 


0 


0.24 


0.84 


10.79 


87.17 


0.12 


0 


0 


0.84 


0 


LAMP2 


0 


7.54 


0 


0.75 


1.88 


86.43 


0 


1.63 


1.76 


0 


Tubulin 


0.34 


5.82 


8.73 


0 


1.90 


0 


80.40 


2.69 


0 


0.11 


Mitoch. 


0 


4.04 


3.35 


0 


1.81 


2.09 


0.42 


88.15 


0 


0.14 


Nucleolin 


0 


0 


0 


0.48 


0.12 


0 


0 


0.12 


99.28 


0 


DNA 


0 


0 


1.61 


0 


0 


0 


0 


0 


0 


98.39 
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Table 6 Averaged confusion matrix for CHO (%) 





anti-giantin 


DNA 


anti-lamp2 


anti-nop4 


anti-tubulin 


anti-giantin 


99.73 


0 


0 


0 


0.27 


DNA 


0 


100 


0 


0 


0 


anti-lamp2 


0 


0 


100 


0 


0 


anti-nop4 


0 


0 


0 


98.34 


1.66 


anti-tubulin 


1.70 


0 


0 


4.67 


93.63 



selected, and their gene IDs are used as class names. 
The genes are CG1258, CG3733, CG3938, CG7922, 
CG8114, CG8222, CG 9484, CG10873, CG12284, 
CG17161 [15,16]. According to [15,16], the images were 
acquired automatically using a Delta- Vision light micro- 
scope with a 60x objective. Each image is produced by 
deconvolution, followed by maximum intensity projec- 
tion (MIP) of a stack of 11 images at different focal 
planes. 

Samples of the above three image sets are illustrated 
in Figures 4, 5 and 6, respectively. 

Image Feature Extraction 

Once the cell images are segmented, various feature 
extraction methods can be applied. In our study, we cal- 
culated texture features using the co-occurrence 
matrices (Haralick features) and applied three multi- 
resolution transforms, including Daubechies wavelet, 
Gabor wavelet and curvelet. 
Gray Level Co-occurrence Matrices 

Gray level co-occurrence matrix (GLCM) proposed by 
Haralick [33] is a common texture analysis method 
which estimates image properties related to second- 
order statistics. GLCM matrix is defined over an image 
to be the distribution of co-occurring values at a given 
offset. Mathematically, a co-occurrence matrix C is 
defined over an n x m image /, parameterized by an 



offset (Ax, Ay) as 

n m 

C A x,Ay(w) = 

p=l 4=1 



1, if I{p,cj) = / and J(p + Ax,q + Ay) = j 
0, otherwise 



(i) 



Note that the (Ax, Ay) parameterization makes the co- 
occurrence matrix sensitive to rotation. An offset vector 
can be chosen such that a rotation of the image not 
equal to 180 degrees will result in a different co-occur- 
rence distribution for the same image. 

In order to estimate the similarity between different 
GLCM matrices, many statistical features can be 
extracted from them. The most relevant features that 
are widely used in literature include: (1). Energy, 
which is a measure of textural uniformity of an image 
and reaches its highest value when gray level distribu- 
tion has either a constant or a periodic form; (2). 
Entropy, which measures the disorder of an image 
and achieves its largest value when all elements in C 
matrix are equal; (3). Contrast, which is a difference 
moment of the C and measures the amount of local 
variations in an image. In addition to these standard 
features, we also calculated the following features 
[33,47,48] derivable from a normalized co-occurrence 
matrix: 

♦ Correlation 

♦ Cluster Prominence 

♦ Cluster Shade 

♦ Homogeneity 

♦ Sum of sqaures 

♦ Sum variance 

♦ Sum entropy 

♦ Difference variance 

♦ Inverse difference (INV) 

♦ Inverse difference normalized (INN) 




Zhang and Pham BMC Bioinformatics 2011, 12:128 
http://www.biomedcentral.eom/1 471 -21 05/1 2/1 28 



Page 9 of 1 3 





The details of these textbook materials are not 
included here as they can be found in many resources, 
for example [47,48]. 

For GLCM feature case, 16 gray co-occurrence 
matrices were created for each image with an offset that 
specifies four orientations 0, 7z74, tt/2 and 3tt/4 and 4 
distances (1,2,3 and 4 pixels) for each direction. Then 
for each normalized co-occurrence matrix P (i, /), 12 dif- 
ferent type of statistic measurements were estimated, 
including correlation, variance, contrast, energy, differ- 
ence variance, entropy, and homogeneity. Thus the 
dimension of GLCM feature is 16 x 12 = 192. 
Discrete Wavelet Transform 

Wavelet transform [38] has some nice features of space- 
frequency localization and multi-resolutions. Let L 2 (R) 
denote the vector space of a measurable, square integr- 
able, one-dimensional function. The continuous wavelet 
transform of a ID signal fit) e L 2 (R) is defined as 

(Waf)(P) = I f(t)4>a,b(t)dt (2) 



where the wavelet basis function (p a> b (t) e L 2 (R) can 
be expressed as 

0a,b(t)=0~*0(— ) 0) 
a 

These basis functions are called wavelets and have at 
least one vanishing moment. The arguments a and b 
denote the scale and location parameters, respectively. 
The oscillation in the basis functions increases with a 
decrease in a. Eq. (2) can be discretized by restraining a 
and b to a discrete lattice {a = 2 n , b e Z). Typically, 
there are some more constraints on (p when a non- 
redundant complete transform is implemented and a 
multiresolution representation is pursued. 

The wavelet basis functions in Eq.(3) are dilated and 
translated versions of the mother wavelet q> (t). There- 
fore, the wavelet coefficients of any scale (or resolution) 
could be computed from the wavelet coefficients of the 
next higher resolutions. This enables the implementa- 
tion of wavelet transform using a tree structure known 







VI j 










CG1258 


CG3733 


CG3938 


CG7922 CG8114 


1 


1 * » 


■ 






CG10873 CG12284 CG17161 CG8222 CG9484 

Figure 6 RNAi image set of fluorescence microscopy images of fly cells (D. melanogaster). 



Zhang and Pham BMC Bioinformatics 2011, 12:128 
http://www.biomedcentral.eom/1 471 -21 05/1 2/1 28 



Page 10 of 13 



as a pyramid algorithm [38]. Here, the wavelet trans- 
form of a ID signal is calculated by splitting it into two 
parts, with a low-pass filter (LPF) and high pass filter 
(HPF), respectively. The low frequency part is split again 
into two parts of high and low frequencies. And the ori- 
ginal signal can be reconstructed from the DWT coeffi- 
cients. The DWT for two dimensional images x[m, n] 
can be similarly defined by implementing the one 
dimensional DWT for each dimension m and n sepa- 
rately: DWT n [DWT m [x[m } «]]]. 2D Wavelet transform 
(WT) decomposes an image into "subbands" that are 
localised in frequency and orientation. A wavelet trans- 
form is created by passing the image through a series of 
filter bank stages. 

There are several ways of generating a 2D wavelet 
transform. The construction of the digital filters differs 
mainly in their scaling and wavelet coefficients. Scaling 
and wavelet function coefficients are characteristic of 
their particular families. In the following, well use Dau- 
bechies D4 for image decomposition [38]. The Daube- 
chies (D4) transform has four wavelet and scaling 
coefficients. In our application, two levels of resolution 
were extracted for each wavelet. At each resolution 
level, the wavelet has three detail coefficient matrices 
representing the vertical, horizontal and diagonal struc- 
tures of the image. From each of the detail coefficient 
matrices, the first-order and second-order statistics 
mean and standard deviation were calculated as image 
features. 
Gabor Wavelet 

Gabor wavelets [39,40] are often used to construct spec- 
tral filters for segmentation or detection of certain 
image texture and periodicity characteristics. In [15,16], 
Gabor wavelet has been used as one of the main fea- 
tures for image representation of cell microscope 
images. The convolution kernel of Gabor filter is a pro- 
duct of a Gaussian and a cosine function, which can be 
characterized by a preferred orientation and a preferred 
spatial frequency: 

x' 2 + yy' 2 x' 
8x,eA x > y) = ex P( ^T~^ cos(2tt- + <p) W 

where 

x' = x cos 6 + y sin 0 
y' = — x sin 0 + y cos 0 

The standard deviation a determines the effective size 
of the Gaussian signal. The eccentricity of the convolu- 
tion kernel g is determined by the parameter A, called 
the spatial aspect ratio. A determines the frequency 
(wavelength) of the cosine. 6 determines the direction of 
the cosine function and finally, *0 is the phase offset. 



Typically, an image is filtered with a set of Gabor fil- 
ters of different preferred orientations and spatial fre- 
quencies that cover appropriately the spatial frequency 
domain, and the features obtained form a feature vector 
that is further used for classification. Given an image / 
(x, y), its Gabor wavelet transform is defined as 



w mn (x, y) = j J 0i/ yi)£mn(*-*i' y-yi)dxidy x (5) 

where * indicates the complex conjugate. We assume 
the local texture regions are spatially homogeneous. The 
mean pi mn and standard deviation o mn of the magnitude 
of transform coefficients are used to represent the 
regions for classification: 



fir, 



W mn (x, y)\dxdy 



// 



{\W mn {x,y)\ - fi mn ) 2 dxdy 



(6) 



(7) 



The Gabor feature vector contains pairs for all the 
scales and orientations of the wavelets. From a number 
of experiments we found that a filter bank with six 
orientations and four scales gave the best classification 
performance, which means 24 x 2 component features 
will be extracted for a given image patch. Therefore, the 
figuration is applied to 6 x 8 non-overlapping image 
subregions each with the size 60 x 64, yielding overall 
feature vector with length 4 x 5 x 48 = 960 for each 
image. 

Curvelet Transform 

Curvelet transform is a multiresolution geometric analy- 
sis proposed by Candes and Donoho [25] for the pur- 
pose of overcoming the drawbacks of conventional two- 
dimensional discrete wavelet transforms of lacking good 
representational capability for direction selectivity. The 
idea is to represent a curve as superposition of functions 
of various length and width obeying the curvelet scaling 
law width ~ length 2 [25\. Figure 7 presents the curvelet 
analysis method. 

The needle-shaped elements of curvelets shown in 
Figure 7 possess very high directional sensitivity and 
anisotropy, which is quite different with the isotropic 
elements of wavelets [27]. Such elements are very effi- 
cient in representing line-like edge. Comparing the cur- 
velet system with the conventional Fourier and wavelet 
analysis can further help our understanding. The short- 
time Fourier transform uses a shape-fixed rectangle in 
Fourier domain, and conventional wavelets use shape- 
changing (dilated) but area-fixed windows. By contrast, 
the curvelet transform uses angled polar wedges or 
angled trapezoid windows in frequency domain in order 
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Figure 7 Curvelet basic digital tiling. The shaded region 
represents one such typical wedge. 



to resolve also directional features. The curvelet trans- 
form coefficients can be expressed by 

c{j f h k):= (f, cpjxk) = / f{x){<pj f i f k{x)dx (8) 

where <fi jj^ denotes curvelet function, and /, / and h 
denotes the variable of scale, orientation, and position 
respectively. In the frequency domain, the curvelet 
transform can be implemented with <fi by means of the 
window function Defining a pair of windows W (r) (a 
radial window) and V (t) (an angular window) as below: 

oo 

J2 W 2 (2V) = 1, re (3/4,3/2) (9) 

)=— 00 

00 

J2v 2 {t-l) =1, te (-1/2, 1/2) (10) 

)= — OO 

where variables W as a frequency domain variable, and 
r and 6 for each j > ; 0 , Uj is defined in the Fourier 
domain by 

2^6 , x 

l/ ; (r, 0) = 2 3 ^co{2-W)v{ ) (11) 

where [j/2] denotes the integer part of j/2. 

From the curvelet coefficients, a common way to con- 
struct image descriptor is via some statistics calculated 
from each of these curvelet sub-bands. For example, the 
mean ft and standard deviation 3 are the most convenient 
features that have been proven efficient in applications 



like image retrieval and face recognition [29-31]. If n cur- 
velets are used for the transform, 2n dimensional feature 
vectors G = [G^, G s ] are obtained, where G^ = [ft lf ♦ • ♦ 
, ft n ], G s = [Si, S 2 , . . . , 8„]. 

In this paper, we applied the second generation dis- 
crete curvelet transform [26], which is implemented in 
four steps: (1) using 2-D fast Fourier transform for the 
image; (2) forming the product of the scale and angle 
windows; (3) wrapping the aforesaid product around the 
origin; and (4) applying the 2-D inverse fast Fourier 
transform. The fast discrete curvelet transform via 
wedge wrapping was applied using the CurveLab Tool- 
box http://www.curvelet.org/ in the MATLAB develop- 
ment environment. Two parameters are involved in the 
implementation: number of resolutions and number of 
angles at the coarsest level. For the images of 2D HeLa, 
CHO and RNAi, five scales were chosen which include 
the coarsest wavelet level. At the 2nd coarsest level 16 
angles were used. With 5 levels analysis, 82(= 1 + 16 + 
32 + 32 + 1) subbands of curvelet coefficients were 
computed. Therefore, a 164 dimension feature vector 
was generated for each image in the three image sets. 
Combined Features from GLCM and Curvelet Transform 
Each feature extracted from above different methods 
characterizes certain aspect of image content. The joint 
exploitation of different image descriptions is often 
necessary to provide comprehensive discriptions in 
order for a classification system with higher accuracy. 
One of the difficulties of multiple feature aggregration 
lies in the high dimensionalities of the image features. 
However, with Random Subspace classifier ensemble 
which will be elaborated in the following, this problem 
is implicitly resolved due to its dimension reduction 
capability. Since the values of GLCM and curvelet fea- 
tures assume different ranges, first we normalize them 
in the range [-1, 1] before combining them in a single 
vector. 

Random Subspace Classifier Ensemble 

A classifiers ensemble means a set of individually 
trained classifiers is integrated appropriately based on 
their component decisions [34,35]. Classifier ensembles 
generally offer improved performance due to the com- 
plementary information provided by the constituent 
classifiers. In this study, we consider artificial neural net- 
work as the base learners. The generalization perfor- 
mance of neural networks is not very stable in the sense 
that different settings such as different network architec- 
tures and initial conditions may all influence the learn- 
ing outcome. The existing of such differences between 
base classifiers is a necessary condition to constitute 
classifier ensemble [34]. Multi-layer perceptron (MLP) 
has been successfully applied to the classification of sub- 
cellular protein location patterns [18,19] and the 
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performance improvement can be expected from an 
MLP ensemble. 

There are many ways to construct a classifier ensemble. 
In this paper we focus on MLP ensembles based on differ- 
ent feature subsets, following the principle of Random 
Subspace method (RSM) proposed in [37]. Being different 
with Bagging [49] and Boosting [50], not training samples 
but the feature variables from curvelet transform and 
GLCM are resampled, i.e., a large number of individual 
MLP models are trained on randomly chosen subsets of 
all available features {i.e. random sunspace). In another 
words, by dividing the entire space of features into sub- 
spaces, ensemble of MLP classifiers is created with one 
base classifier trained on each subspace from randomly 
selecting features from the entire space. The details of RS 
ensemble can be further elaborated as in the following. 

Input: a d-dimensional labeled training data set 

(1) Select a random projection from the ^-dimen- 
sional feature space to a /^-dimensional subspace; 

(2) Project the data from the original ^-dimensional 
feature space into the selected /^-dimensional 
subspace; 

(3) Train an MLP classifier on the acquired k- 
dimensional feature; 

(4) Repeat steps 1-3 m times to obtain m different 
subspaces for the ensemble individuals; 

(5) Aggregate the individual classifiers by majority 
voting. 

Availability and Requirements 

The reviewers can access the Matlab code, which has 
been included with the manuscript as additional file 1. 

Additional material 



Additional file 1: Matlab code. Matlab program used in the 
experiment, including feature extraction, different classifers comparison, 
and implementation of the random subspace ensemble, with the 
example of RNAi data. 
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